Gradient Temporal Difference Networks
Author
Abstract
Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive representation of state in which each node is an answer to a question about future observations or questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even in very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop a true gradient descent algorithm that takes account of all three roles performed by each node in the network: as state, as an answer, and as a target for other questions. Our algorithm combines gradient temporal-difference learning (Maei et al., 2009) with real-time recurrent learning (Williams and Zipser, 1994). We provide a generalisation of the Bellman equation that corresponds to the semantics of the TD network, and prove that our algorithm converges to a fixed point of this equation.
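The gradient-TD component of the combination described above can be illustrated in a simplified linear setting. The sketch below shows a TDC-style update in the spirit of Maei et al. (2009); the full TD-network algorithm of this paper additionally propagates RTRL-style sensitivities through the recurrent state, which this sketch omits. The toy chain problem and step sizes are illustrative, not taken from the paper.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC (gradient-TD) update for a linear prediction theta @ phi.

    theta: value weights; w: auxiliary weights that estimate the expected
    TD error and supply the gradient-correction term.
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi   # TD error
    theta = theta + alpha * (delta * phi - gamma * phi_next * (w @ phi))
    w = w + beta * (delta - w @ phi) * phi
    return theta, w

# Toy two-state chain: s0 -(r=0)-> s1 -(r=1)-> terminal, gamma = 0.9,
# so the true values are V(s0) = 0.9 and V(s1) = 1.0.
phi0, phi1, end = np.array([1., 0.]), np.array([0., 1.]), np.zeros(2)
theta, w = np.zeros(2), np.zeros(2)
for _ in range(2000):
    theta, w = tdc_update(theta, w, phi0, phi1, 0.0, 0.9, 0.1, 0.05)
    theta, w = tdc_update(theta, w, phi1, end, 1.0, 0.9, 0.1, 0.05)
```

Unlike semi-gradient TD, an update of this form follows the gradient of a projected Bellman-error objective, which is what makes a convergence proof of the kind described in the abstract possible.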
Similar resources
Multi-step Predictions Based on TD-DBP ELMAN Neural Network for Wave Compensating Platform
The gradient-descent-momentum and adaptive-learning-rate TD-DBP algorithm can effectively improve the training speed and stability of the Elman network. The BP algorithm is a typical supervised learning algorithm, so a neural network cannot be trained on-line with it. For this reason, a new algorithm (TD-DBP), composed of the temporal-difference (TD) method and the dynamic BP algorithm (DBP), was propos...
Full text
Modular SRV Reinforcement Learning Architectures for Non-linear Control
This paper demonstrates the advantages of using a hybrid reinforcement–modular neural network architecture for non-linear control. Specifically, ACTION-CRITIC reinforcement learning, modular neural networks, competitive learning, and stochastic updating are combined. This provides an architecture able both to support temporal-difference learning and to perform probabilistic partitioning of th...
Full text
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximat...
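As a point of reference for the conventional methods this excerpt mentions, a minimal semi-gradient TD(λ) update with linear function approximation and an accumulating eligibility trace might look as follows; the chain problem and step sizes are illustrative.

```python
import numpy as np

def td_lambda_step(theta, e, phi, phi_next, reward, gamma, lam, alpha):
    """Semi-gradient TD(lambda) with an accumulating eligibility trace."""
    delta = reward + gamma * theta @ phi_next - theta @ phi   # TD error
    e = gamma * lam * e + phi                                  # decay, then add
    theta = theta + alpha * delta * e
    return theta, e

# On-policy evaluation of the chain s0 -(r=0)-> s1 -(r=1)-> terminal,
# gamma = 0.9, so V(s0) = 0.9 and V(s1) = 1.0.
phi0, phi1, end = np.array([1., 0.]), np.array([0., 1.]), np.zeros(2)
theta = np.zeros(2)
for _ in range(1000):
    e = np.zeros(2)                      # traces reset at each episode start
    theta, e = td_lambda_step(theta, e, phi0, phi1, 0.0, 0.9, 0.5, 0.1)
    theta, e = td_lambda_step(theta, e, phi1, end, 1.0, 0.9, 0.5, 0.1)
```

On-policy with linear features this converges; the instability the excerpt refers to arises under off-policy sampling or nonlinear function approximation, which is what motivates the gradient-based variants.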
Full text
Intelligent Optimization of a Mixed Culture Cultivation Process
In the present paper, a neural-network approach called "Adaptive Critic Design" (ACD) was applied to the optimal tuning of the set-point controllers for the three main substrates (sugar, nitrogen source, and dissolved oxygen) in a PHB production process. To approximate the critic and the controllers, a special kind of recurrent neural network called an Echo State Network (ESN) was used. Their structur...
Full text
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of the back-propagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. The analysis is conducted on 250 different three-letter words from the English alphabet. These words are presented to two vertical segmentation programs, designed in MATLAB, based on portions (1...
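The momentum term discussed in this excerpt follows the classical update sketched below; the quadratic objective and hyperparameter values are illustrative, not taken from the study.

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """Gradient descent with a momentum (velocity) term."""
    v = mu * v - lr * grad   # velocity accumulates a decaying sum of gradients
    return w + v, v

# Minimise the toy objective f(w) = w**2 (gradient 2w) from w = 5.0.
w, v = 5.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, 2.0 * w)
```

The velocity term damps oscillations across successive updates and accelerates progress along directions of consistent gradient, which is the usual motivation for adding momentum to plain back-propagation.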
Full text